Protein-dependent prediction of messenger RNA binding using Support Vector Machines

Author

  • Carmen Maria Livi
Abstract

RNA-binding proteins interact specifically with RNA strands to regulate important cellular processes. Knowing the binding partners of a protein is crucial in biology and essential for understanding the protein's function and its involvement in disease. These interactions can currently be identified only through in vivo and in vitro experiments, which may not detect all binding partners. Computational methods that capture the protein-dependent nature of binding could help to predict binding in silico and could be robust to experimental biases. This thesis addresses the creation of models based on support vector machines and trained on experimental data. The goal is to identify RNAs that bind specifically to a regulatory protein. Starting from a case study on the protein CELF1, we extend our approach and propose three methods to predict whether an RNA strand can be bound by a particular RNA-binding protein. The methods use support vector machines with different features based on the sequence (method Oli), the motif score (method OliMo) and the secondary structure (method OliMoSS). We apply them to different experimentally derived datasets and compare the predictions with two existing methods, RNAcontext and RPISeq. Oli outperforms OliMoSS and RPISeq, which supports our protein-specific prediction approach and suggests that oligo frequencies are good discriminative features. Oli and RNAcontext are the most competitive methods in terms of AUC, and a precision-recall analysis reveals a better performance for Oli. On a second experimental dataset, where negative binding information is available, Oli outperforms RNAcontext with a precision of 0.73 vs. 0.59. Our experiments show that features based on primary sequence information are highly discriminative for predicting the binding between protein and RNA. Sequence motifs improve the prediction only for some RNA-binding proteins.
Finally, we conclude that experimental data on RNA binding can be used effectively to train protein-specific models for in silico predictions.
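
The sequence-based features mentioned in the abstract (oligo frequencies) can be illustrated with a minimal sketch. The abstract does not state the oligo length, the normalization, or how ambiguous bases are handled, so the choices below (`k=2`, relative frequencies, skipping windows that contain characters outside A, C, G, U) and the function name `oligo_frequencies` are assumptions for illustration only, not the thesis implementation.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style T is mapped to U below


def oligo_frequencies(sequence, k=2):
    """Return relative frequencies of all 4**k oligonucleotides of length k.

    Hypothetical helper: the oligo length and normalization used in the
    thesis are not given in the abstract.
    """
    seq = sequence.upper().replace("T", "U")
    # Fixed lexicographic order so that every sequence maps to the same layout.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# One fixed-length feature vector per RNA sequence.
features = oligo_frequencies("AUGGUUGUUU", k=2)
```

Each RNA yields a vector of length 4**k regardless of its length, which is the kind of fixed-size input a support vector machine classifier expects; one such model would be trained per RNA-binding protein on its experimentally bound and unbound RNAs.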

Similar resources

A Comparative Study of Extreme Learning Machines and Support Vector Machines in Prediction of Sediment Transport in Open Channels

The limiting velocity in open channels to prevent long-term sedimentation is predicted in this paper using a powerful soft computing technique known as Extreme Learning Machines (ELM). The ELM is a Single-Layer Feed-forward Neural Network (SLFNN) with a high training speed. The dimensionless parameter of limiting velocity which is known as the densimetric Froude number (Fr) is predicte...

Prediction of Ligand Binding sites in RNA binding protein Pockets using support vector machines

RNA-binding proteins play a significant role in pattern regulation of gene expression during developmental phases. Therefore, in order to facilitate our understanding of organism development, there is a continuous need to develop an extensive a priori method for the prediction of RNA-binding protein pockets. We present here an SVM (Support Vector Machine) based approach for successful prediction o...

Prediction of RNA-binding proteins from primary sequence by a support vector machine approach.

Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions. But insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzy...

Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence.

Classification of gene function remains one of the most important and demanding tasks in the post-genome era. Most of the current predictive computer methods rely on comparing features that are essentially linear to the protein sequence. However, features of a protein nonlinear to the sequence may also be predictive to its function. Machine learning methods, for instance the Support Vector Mach...

Toward a Systematic Definition of Protein Function That Scales to the Genome Level: Defining Function in Terms of Interactions

The ultimate goal of functional genomics is to elucidate the function of all the genes in the genome. However, the current notions of function are crafted for individual proteins. The degree to which they can scale to the genomic level is not clear. In this paper, we review the diverse approaches to functional classification, focusing on their ability to meet this challenge of scale. Our review em...

Publication date: 2013